METHOD AND SYSTEM FOR PROVIDING A PROBE ARRAY CHIP DESIGN DATABASE

CROSS-REFERENCE TO RELATED APPLICATIONS 

The present application claims priority from U.S. Prov. App. No. 60/053,842 filed July 25, 1997, entitled COMPREHENSIVE BIO-INFORMATICS DATABASE, from U.S. Prov. App. No. 60/069,198 filed on December 11, 1997, entitled COMPREHENSIVE DATABASE FOR BIOINFORMATICS, and from U.S. Prov. App. No. 60/069,436, entitled GENE EXPRESSION AND EVALUATION SYSTEM, filed on December 11, 1997. The contents of all three provisional applications are herein incorporated by reference.

The subject matter of the present application is related to the subject matter of the following three co-assigned applications filed on the same day as the present application: GENE EXPRESSION AND EVALUATION SYSTEM (Attorney Docket No. 018547-035010), METHOD AND APPARATUS FOR PROVIDING A BIOINFORMATICS DATABASE (Attorney Docket No. 018547-033810), and METHOD AND SYSTEM FOR PROVIDING A POLYMORPHISM DATABASE (Attorney Docket No. 018547-033820). The contents of these three applications are herein incorporated by reference.

BACKGROUND OF THE INVENTION 

The present invention relates to the collection and storage of information pertaining to chips for processing samples.

Devices and computer systems for forming and using arrays of materials on a substrate are known. For example, PCT application WO92/10588, incorporated herein by reference for all purposes, describes techniques for sequencing or sequence checking nucleic acids and other materials. Arrays for performing these operations may be formed according to the methods of, for example, the pioneering techniques disclosed in U.S. Patent No. 5,143,854 and U.S. Patent No. 5,571,639, both incorporated herein by reference for all purposes.
According to one aspect of the techniques described therein, an array of nucleic acid probes is fabricated at known locations on a chip or substrate. A fluorescently labeled nucleic acid is then brought into contact with the chip, and a scanner generates an image file indicating the locations where the labeled nucleic acids bound to the chip. Based upon the identities of the probes at these locations, it becomes possible to extract information such as the monomer sequence of DNA or RNA. Such systems have been used to form, for example, arrays of DNA that may be used to study and detect mutations relevant to cystic fibrosis, the p53 gene (relevant to certain cancers), HIV, and other genetic characteristics.

Computer-aided techniques for monitoring gene expression using such arrays of probes have also been developed, as disclosed in U.S. Patent Application No. 08/828,952 and PCT publication No. WO 97/10365, the contents of which are herein incorporated by reference. Many disease states are characterized by differences in the expression levels of various genes, either through changes in the copy number of the genomic DNA or through changes in levels of transcription (e.g., through control of initiation, provision of RNA precursors, RNA processing, etc.) of particular genes. For example, losses and gains of genetic material play an important role in malignant transformation and progression. Furthermore, changes in the expression (transcription) levels of particular genes (e.g., oncogenes or tumor suppressors) serve as signposts for the presence and progression of various cancers.

As can be seen, probe array chips are designed to answer questions about genomic items, herein defined to include genes, expressed sequence tags (ESTs), gene clusters, and EST clusters. Associated with information about genomic items is sequence information concerning the base sequences of those genomic items. Probes are designed and selected for inclusion on a chip based on: 1) the identity of the genomic items to be investigated by the chip, 2) the sequence information associated with those genomic items, and 3) the type of information sought, e.g., expression analysis, polymorphism analysis, etc. The interrelationships among probes, genomic items, and sequence information are, however, extremely complex, greatly complicating the tasks of designing chips, effectively exploiting chips that have already been designed, and efficiently interpreting the information generated by application of the chips.

Moreover, it is contemplated that chip design, construction, and application will occur on a very large scale, so the quantity of chip design information that must be stored and correlated is vast. What is needed is a system and method suitable for storing and organizing large quantities of information used in conjunction with the design of probe array chips.

SUMMARY OF THE INVENTION

The present invention provides systems and methods for organizing information relating to the design of polymer probe array chips, including oligonucleotide array chips. A database model is provided which organizes information interrelating probes on a chip, genomic items investigated by the chip, and sequence information relating to the design of the chip. The model is readily translatable into database languages such as SQL. The database model scales to permit storage of information about large numbers of chips having complex designs.

According to one aspect of the present invention, a computer-readable storage medium is provided. A relational database is stored on this medium. The relational database includes a probe table including a plurality of probe records, each of the probe records specifying a polymer probe for use in one or more polymer probe arrays, and a sequence item table including a plurality of sequence item records, each of the sequence item records specifying a nucleotide sequence to be investigated in the one or more polymer probe arrays, wherein there is a many-to-many relationship between the probe records and the sequence item records.
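As a minimal sketch only, the many-to-many relationship recited above could be realized in SQL with a junction table, shown here through Python's built-in sqlite3 module. The table and column names (probe, sequence_item, probe_sequence_map) and the sample sequences are hypothetical illustrations and are not taken from the specification.

```python
import sqlite3

# Hypothetical table and column names, for illustration only.
SCHEMA = """
CREATE TABLE probe (
    probe_id       INTEGER PRIMARY KEY,
    probe_sequence TEXT NOT NULL
);
CREATE TABLE sequence_item (
    sequence_item_id    INTEGER PRIMARY KEY,
    nucleotide_sequence TEXT NOT NULL
);
-- Junction table: each row links one probe record to one sequence item
-- record, so a probe may relate to many sequence items and vice versa.
CREATE TABLE probe_sequence_map (
    probe_id         INTEGER NOT NULL REFERENCES probe(probe_id),
    sequence_item_id INTEGER NOT NULL REFERENCES sequence_item(sequence_item_id),
    PRIMARY KEY (probe_id, sequence_item_id)
);
"""


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO probe VALUES (?, ?)",
                     [(1, "ACGTTGCA"), (2, "GGCCAATT")])
    conn.executemany("INSERT INTO sequence_item VALUES (?, ?)",
                     [(10, "TTACGTTGCATT"), (11, "ACGTTGCAGGCCAATT")])
    # Probe 1 relates to both sequence items; sequence item 11 relates
    # to both probes.
    conn.executemany("INSERT INTO probe_sequence_map VALUES (?, ?)",
                     [(1, 10), (1, 11), (2, 11)])
    return conn


def probes_for_sequence(conn: sqlite3.Connection, sequence_item_id: int) -> list:
    rows = conn.execute(
        "SELECT probe_id FROM probe_sequence_map "
        "WHERE sequence_item_id = ? ORDER BY probe_id",
        (sequence_item_id,))
    return [row[0] for row in rows]


def sequences_for_probe(conn: sqlite3.Connection, probe_id: int) -> list:
    rows = conn.execute(
        "SELECT sequence_item_id FROM probe_sequence_map "
        "WHERE probe_id = ? ORDER BY sequence_item_id",
        (probe_id,))
    return [row[0] for row in rows]
```

The same junction table answers the question in both directions: with the demo data, sequence item 11 is reached by probes 1 and 2, and probe 1 reaches sequence items 10 and 11.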

A further understanding of the nature and advantages of the inventions herein 
may be realized by reference to the remaining portions of the specification and the attached 
drawings. 

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 illustrates an overall system and process for forming and analyzing
arrays of biological materials such as DNA or RNA. 

Fig. 2A illustrates a computer system suitable for use in conjunction with the 
overall system of Fig. 1. 

Fig. 2B illustrates a computer network suitable for use in conjunction with the overall system of Fig. 1.

Fig. 3 illustrates a key for interpreting a database model. 
Fig. 4 illustrates a database model for maintaining information for the system
and process of Fig. 1 according to one embodiment of the present invention. 

DESCRIPTION OF SPECIFIC EMBODIMENTS 
Biological Material Analysis System

One embodiment of the present invention operates in the context of a system for analyzing biological or other materials using arrays that themselves include probes that may be made of biological materials such as RNA or DNA. The VLSIPS™ and GeneChip™ technologies provide methods of making and using very large arrays of polymers, such as nucleic acids, on chips. See U.S. Patent No. 5,143,854 and PCT Patent Publication Nos. WO 90/15070 and WO 92/10092, each of which is hereby incorporated by reference for all purposes. Nucleic acid probes on the chip are used to detect complementary nucleic acid sequences in a sample nucleic acid of interest (the "target" nucleic acid).

It should be understood that the probes need not be nucleic acid probes but may also be other polymers such as peptides. Peptide probes may be used to detect the concentration of peptides, polypeptides, or polymers in a sample. The probes must be carefully selected to have binding affinity to the compound whose concentration they are to be used to measure.

Fig. 1 illustrates an overall system 100 for forming and analyzing arrays of biological materials such as RNA or DNA. A part of system 100 is a chip design database 102. Chip design database 102 includes information about chip designs and the purposes of chips. Chip design database 102 facilitates large-scale design, construction, and processing of chips.

A chip design system 104 is used to design arrays of polymers, such as biological polymers like RNA or DNA. Chip design system 104 may be, for example, an appropriately programmed Sun workstation or personal computer or workstation, such as an IBM PC equivalent, including appropriate memory and a CPU. Chip design system 104 obtains inputs from a user regarding chip design objectives, including characteristics of genes of interest, and other inputs regarding the desired features of the array. All of this information may be stored in chip design database 102. Optionally, chip design system 104 may obtain information regarding a specific genetic sequence of interest from chip design database 102 or from external databases such as GenBank. The output of chip design system 104 is a set of chip design computer files in the form of, for example, a switch matrix, as described in PCT application WO 92/10092, and other associated computer files. The chip design computer files form a part of chip design database 102. Systems for designing chips for sequence determination and expression analysis are disclosed in U.S. Patent No. 5,571,639 and in PCT application WO 97/10365, the contents of which are herein incorporated by reference.

The chip design files are input to a mask design system (not shown) that designs the lithographic masks used in the fabrication of probe arrays of molecules such as DNA. The mask design system generates mask design files that are then used by a mask construction system (not shown) to construct masks or other synthesis patterns, such as chrome-on-glass masks, for use in the fabrication of polymer arrays.

The masks are used in a synthesis system (not shown). The synthesis system includes the necessary hardware and software used to fabricate arrays of polymers on a substrate or chip. The synthesis system includes a light source and a chemical flow cell on which the substrate or chip is placed. A mask is placed between the light source and the substrate/chip, and the two are translated relative to each other at appropriate times for deprotection of selected regions of the chip. Selected chemical reagents are directed through the flow cell for coupling to deprotected regions, as well as for washing and other operations. The substrates fabricated by the synthesis system are optionally diced into smaller chips. The output of the synthesis system is a chip ready for application of a target sample.

Information about the mask design, mask construction, probe array synthesis, and analysis systems is presented by way of background. A biological source 112 is, for example, tissue from a plant or animal. Various processing steps are applied to material from biological source 112 by a sample preparation system 114. These steps may include, e.g., isolation of mRNA, precipitation of the mRNA to increase concentration, synthesis of cDNA from mRNA, PCR amplification of fragments of interest, etc. The result of the various processing steps is a target ready for application to the chips produced by the synthesis system 110.

The prepared samples include monomer nucleotide sequences such as RNA or DNA. When the sample is applied to the chip by a sample exposure system 116, the nucleotides may or may not bond to the probes. The nucleotides have been tagged with fluorescein labels to determine which probes have bonded to nucleotide sequences from the sample. The chips, with the prepared samples applied, are then placed in a scanning system 118. Scanning system 118 includes a detection device such as a confocal microscope or CCD (charge-coupled device) that is used to detect the locations where labeled receptors have bound to the substrate. The output of scanning system 118 is one or more image files indicating, in the case of a fluorescein-labeled receptor, the fluorescence intensity (photon counts or other related measurements, such as voltage) as a function of position on the substrate. These image files also form a part of chip design database 102. Since higher photon counts will be observed where the labeled receptor has bound more strongly to the array of polymers, and since the monomer sequence of the polymers on the substrate is known as a function of position, it becomes possible to determine the sequence(s) of polymer(s) on the substrate that are complementary to the receptor.

The image files and the design of the chips are input to an analysis system 120 that, e.g., calls base sequences or determines expression levels of genes or expressed sequence tags. The expression level of a gene or EST is herein understood to be the concentration within a sample of mRNA or protein that would result from the transcription of the gene or EST. Such analysis techniques are disclosed in WO 97/10365, the contents of which are herein incorporated by reference. Base calling techniques are described in WO 95/11995, the contents of which are herein incorporated by reference.

Chip design system 104, analysis system 120, and control portions of exposure system 116, sample preparation system 114, and scanning system 118 may be appropriately programmed computers such as a Sun workstation or IBM-compatible PC. An independent computer for each system may perform the computer-implemented functions of these systems, or one computer may combine the computerized functions of two or more systems. One or more computers may maintain chip design database 102 independently of the computers operating the systems of Fig. 1, or chip design database 102 may be fully or partially maintained by these computers.

Fig. 2A depicts a block diagram of a host computer system 210 suitable for implementing the present invention. Host computer system 210 includes a bus 212 which interconnects major subsystems such as a central processor 214, a system memory 216 (typically RAM), an input/output (I/O) adapter 218, an external device such as a display screen 224 via a display adapter 226, a keyboard 232 and a mouse 234 via I/O adapter 218, a SCSI host adapter 236, and a floppy disk drive 238 operative to receive a floppy disk 240. SCSI host adapter 236 may act as a storage interface to a fixed disk drive 242 or a CD-ROM player 244 operative to receive a CD-ROM 246. Fixed disk 242 may be a part of host computer system 210 or may be separate and accessed through other interface systems. A network interface 248 may provide a direct connection to a remote server via a telephone link or to the Internet. Network interface 248 may also connect to a local area network (LAN) or other network interconnecting many computer systems. Many other devices or subsystems (not shown) may be connected in a similar manner.

Also, it is not necessary for all of the devices shown in Fig. 2A to be present to practice the present invention, as discussed below. The devices and subsystems may be interconnected in different ways from that shown in Fig. 2A. The operation of a computer system such as that shown in Fig. 2A is readily known in the art and is not discussed in detail in this application. Code to implement the present invention may be operably disposed or stored in computer-readable storage media such as system memory 216, fixed disk 242, CD-ROM 246, or floppy disk 240.

Fig. 2B depicts a network 260 interconnecting multiple computer systems 210. Network 260 may be a local area network (LAN), wide area network (WAN), etc. Chip design database 102 and the computer-related operations of the other elements of Fig. 1 may be divided amongst computer systems 210 in any way, with network 260 being used to communicate information among the various computers. Portable storage media such as floppy disks may be used to carry information between computers instead of network 260.

Overall Description of Database 

Chip design database 102 is preferably a relational database with a complex internal structure. The structure and contents of chip design database 102 will be described with reference to a logical model that describes the contents of the tables of the database as well as the interrelationships among the tables. A visual depiction of this model is an Entity Relationship Diagram (ERD), which includes entities, relationships, and attributes. A detailed discussion of ERDs is found in "ERwin version 3.0 Methods Guide," available from Logic Works, Inc. of Princeton, NJ, the contents of which are herein incorporated by reference. Those of skill in the art will appreciate that automated tools such as Developer 2000, available from Oracle, can convert the ERD of Fig. 4 directly into executable code, such as SQL code, for creating and operating the database.

Fig. 3 is a key to the ERD that will be used to describe the contents of chip design database 102. A representative table 302 includes one or more key attributes 304 and one or more non-key attributes 306. Representative table 302 includes one or more records, where each record includes fields corresponding to the listed attributes. The contents of the key fields taken together identify an individual record. In the ERD, each table is represented by a rectangle divided by a horizontal line. The fields or attributes above the line are key, while the fields or attributes below the line are non-key. An identifying relationship 308 signifies that the key attribute of a parent table 310 is also a key attribute of a child table 312. A non-identifying relationship 314 signifies that the key attribute of a parent table 316 is also a non-key attribute of a child table 318. Where (FK) appears in parentheses, it indicates that an attribute of one table is a key attribute of another table, i.e., a foreign key. For both the non-identifying and the identifying relationship, one record in the parent table corresponds to one or more records in the child table.
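The two relationship kinds in the key can be illustrated, under assumed table names, with SQLite DDL driven from Python. Here a hypothetical chip_design_name table has an identifying relationship to chip_design (the parent key is part of the child key), while a hypothetical tiling_item table has a non-identifying relationship to tiling_item_type (the parent key is an ordinary column). This is a sketch of the notation only, not the schema of Fig. 4.

```python
import sqlite3

DDL = """
CREATE TABLE chip_design (
    chip_design_id INTEGER PRIMARY KEY
);
-- Identifying relationship: the parent key is part of the child key.
CREATE TABLE chip_design_name (
    chip_design_id INTEGER NOT NULL REFERENCES chip_design(chip_design_id),
    name           TEXT    NOT NULL,
    PRIMARY KEY (chip_design_id, name)
);
CREATE TABLE tiling_item_type (
    tiling_item_type_id INTEGER PRIMARY KEY,
    description         TEXT NOT NULL
);
-- Non-identifying relationship: the parent key is a non-key (FK) column.
CREATE TABLE tiling_item (
    tiling_item_id      INTEGER PRIMARY KEY,
    tiling_item_type_id INTEGER NOT NULL
        REFERENCES tiling_item_type(tiling_item_type_id)
);
"""


def make_connection() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    return conn


def primary_key_columns(conn: sqlite3.Connection, table: str) -> list:
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk);
    # pk is the 1-based position within the primary key, or 0 otherwise.
    # The table name is interpolated directly, so pass trusted names only.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    key_cols = [(row[5], row[1]) for row in rows if row[5] > 0]
    return [name for _, name in sorted(key_cols)]


def foreign_key_parents(conn: sqlite3.Connection, table: str) -> list:
    # PRAGMA foreign_key_list yields (id, seq, table, from, to, ...);
    # column 2 names the parent table.
    rows = conn.execute(f"PRAGMA foreign_key_list({table})").fetchall()
    return sorted(row[2] for row in rows)
```

Both child tables carry a foreign key to their parent; the difference is only whether that column also belongs to the child's primary key.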

At the highest level, chip design database 102 may be understood as providing a relational structure among genomic items, sequence items, and tiling items, as these terms are defined herein by way of example. Genes are characterized by their sequence, location on the genome, and function. Genomic items are herein defined as references to genes, gene clusters, expressed sequence tags (ESTs), and EST clusters by location and/or function but not by sequence. Sequence items are herein defined to be any oligonucleotide sequence or group of oligonucleotide sequences, which may or may not by itself have biological meaning. A sequence item may be a long sequence of genomic DNA including more than one exon of biological significance. Alternatively, an exon may include many sequence items. Also, a genomic item may have multiple associated sequence items or groups of sequence items because the sequence information stored in public genomic databases changes over time. Genomic items and sequence items are tracked separately by database 102. There is a many-to-many relationship between genomic items and sequence items, which is captured by the internal structure of chip design database 102.

Tiling items represent groupings of probes on a chip. A tiling item may be a pair, or a group of pairs, of match and mismatch probes for an expression analysis chip. For sequencing chips, a tiling item may be an atom including a group of probes designed to detect a mutation or call a base at a particular base position. Tiling items are designed to interrogate sequence items, e.g., determine expression or call bases. However, a single tiling item may be used to interrogate more than one sequence item; for example, a sequence item may identify a group of sequences or a single sequence that is longer than the length of a probe. Conversely, certain difficult sequences, e.g., sequences including long runs of the same base, may require more than one tiling item for interrogation. There is thus a many-to-many relationship between sequence items and tiling items, and this relationship is also captured by the internal design of chip design database 102.

Tiling items include probe pair sets. A probe pair set represents a single sequence on a chip and includes probe pairs. Chip design database 102 thus enables one to follow the various interrelationships described above and, e.g., associate a particular probe on a chip with the associated probe pair, probe pair set, tiling item, sequence item, genomic item, etc. The associated genomic item may be a gene cluster associated with a particular gene and an accession number within some biological database. All of these highly complex relationships are preferably captured within chip design database 102.
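To illustrate following these interrelationships, the sketch below uses an assumed, heavily simplified schema (probe pairs and probe pair sets are folded into the tiling item, and every table name is hypothetical) and answers "which genomic items does this probe ultimately interrogate?" with a single SQL join, run through Python's sqlite3 module.

```python
import sqlite3

# Hypothetical, simplified tables; not the full model of Fig. 4.
SCHEMA = """
CREATE TABLE genomic_item (genomic_item_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sequence_item (sequence_item_id INTEGER PRIMARY KEY);
CREATE TABLE genomic_sequence_map (
    genomic_item_id INTEGER, sequence_item_id INTEGER,
    PRIMARY KEY (genomic_item_id, sequence_item_id));
CREATE TABLE tiling_item (tiling_item_id INTEGER PRIMARY KEY);
CREATE TABLE tiling_sequence_map (
    tiling_item_id INTEGER, sequence_item_id INTEGER,
    PRIMARY KEY (tiling_item_id, sequence_item_id));
CREATE TABLE probe (probe_id INTEGER PRIMARY KEY, tiling_item_id INTEGER);
"""

# probe -> tiling item -> sequence item(s) -> genomic item(s)
GENOMIC_ITEMS_FOR_PROBE = """
SELECT DISTINCT g.name
FROM probe p
JOIN tiling_sequence_map ts ON ts.tiling_item_id = p.tiling_item_id
JOIN genomic_sequence_map gs ON gs.sequence_item_id = ts.sequence_item_id
JOIN genomic_item g ON g.genomic_item_id = gs.genomic_item_id
WHERE p.probe_id = ?
ORDER BY g.name
"""


def demo_connection() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO genomic_item VALUES (?, ?)",
                     [(1, "gene_a"), (2, "est_cluster_7")])
    conn.executemany("INSERT INTO sequence_item VALUES (?)", [(10,), (11,)])
    # gene_a has two sequence items (e.g., from two database revisions);
    # sequence item 11 is shared with est_cluster_7.
    conn.executemany("INSERT INTO genomic_sequence_map VALUES (?, ?)",
                     [(1, 10), (1, 11), (2, 11)])
    conn.execute("INSERT INTO tiling_item VALUES (100)")
    conn.execute("INSERT INTO tiling_sequence_map VALUES (100, 11)")
    conn.executemany("INSERT INTO probe VALUES (?, ?)",
                     [(1000, 100), (1001, 100)])
    return conn


def genomic_items_for_probe(conn: sqlite3.Connection, probe_id: int) -> list:
    return [row[0] for row in conn.execute(GENOMIC_ITEMS_FOR_PROBE, (probe_id,))]
```

Because sequence item 11 maps to both genomic items, probe 1000 reaches both; DISTINCT removes duplicates that would arise if several paths led to the same genomic item.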

Chip design database 102 also preferably includes information such as the tiling items contained within any particular chip design. There also may be information about customer orders for a particular chip design, including what sequences were to be tested by a particular chip design, who ordered the chip design, etc.


Applications of Chip Design Database 

Chip design database 102 is a highly useful tool in designing chips and tracking existing chip designs. One application is storing intermediate data about genomic items, sequence items, etc. that is input or generated during the course of generating a chip design. Scientists may request that particular genes or sequences be investigated. An intermediate step in determining the chip design will be populating chip design database 102 with the information identifying the genes or sequences to be investigated. Since chip design database 102 preserves the information about the genomic items that are investigated by a particular chip design, it is also very useful in finding existing chip designs that are capable of servicing new requests. Also, chip design database 102 may be used after chip design is complete to answer questions about which genomic items and/or sequence items are interrogated by a particular probe or tiling item.
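One way to find existing chip designs capable of servicing a new request is relational division: keep only the designs that interrogate every requested genomic item. The sketch below assumes a hypothetical table design_coverage(chip_design_id, genomic_item_id), standing in for the join from chip designs through tiling items and sequence items to genomic items; the data are invented for illustration.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Hypothetical stand-in for the design -> genomic item join.
    conn.execute("CREATE TABLE design_coverage ("
                 "chip_design_id INTEGER NOT NULL, "
                 "genomic_item_id INTEGER NOT NULL, "
                 "PRIMARY KEY (chip_design_id, genomic_item_id))")
    conn.executemany("INSERT INTO design_coverage VALUES (?, ?)",
                     [(1, 1), (1, 2), (1, 3),   # design 1 covers items 1-3
                      (2, 2), (2, 3),           # design 2 covers items 2-3
                      (3, 1), (3, 3)])          # design 3 covers items 1 and 3
    return conn


def designs_covering(conn: sqlite3.Connection, requested) -> list:
    """Return ids of chip designs that interrogate every requested genomic item."""
    items = sorted(set(requested))
    if not items:
        return []
    placeholders = ",".join("?" * len(items))   # e.g. "?,?,?"
    sql = (f"SELECT chip_design_id FROM design_coverage "
           f"WHERE genomic_item_id IN ({placeholders}) "
           f"GROUP BY chip_design_id "
           f"HAVING COUNT(DISTINCT genomic_item_id) = ? "
           f"ORDER BY chip_design_id")
    return [row[0] for row in conn.execute(sql, (*items, len(items)))]
```

A design qualifies only when the number of distinct requested items it covers equals the size of the request, so a request for items 1 and 3 is served by designs 1 and 3 but not by design 2.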
Database Model

Fig. 4 is an entity relationship diagram (ERD) showing elements of chip design database 102 according to one embodiment of the present invention. Each rectangle in the diagram corresponds to a table in database 102. For each rectangle, the title of the table is listed above the rectangle. Within each rectangle, the columns of the table are listed. Above a horizontal line within each rectangle are listed key columns, i.e., columns whose contents are used to identify individual records in the table. Below this horizontal line are the names of non-key columns. The lines between the rectangles identify the relationships between records of one table and records of another table. First, the relationships among the various tables will be described. Then, the contents of each table will be discussed in detail.

The tables of database 102 may be understood as belonging to different groups that relate to purpose. In Fig. 4, each table is denoted with a capital letter "A" through "F" to denote membership in a group. Group A includes sequence and biological data. Group B includes design request information. Group C includes chip design information such as which probes are included and how they are laid out. Group D includes design specification information, including information used in selecting probes. Group E includes information about compliance with customer contracts for chip design and production. Group F includes information about sequences requested but not included in a final chip design because of difficulty in selecting probes that would be effective in investigating them.

The interrelationships and general contents of the tables of database 102 will be described first. Then a chart will be presented listing and describing all of the fields of the various tables.

A tiling item table 402 lists the various tiling items. Each record in tiling item table 402 identifies a tiling item for a particular chip design. Each tiling item has an associated tiling item type listed in a tiling item type table 406. Examples of tiling item types include "probe pairs," which would identify a perfect match/mismatch probe pair, or "atom," which would indicate a group of probes used for determining a mutation or calling a base at a particular base position. Each tiling item has one or more associated probes, which are listed in a probe table 408.

A tiling item may itself be an aggregation of other tiling items. A tiling composition table 409 includes records that associate aggregate tiling items with the tiling items they include.
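Because an aggregate tiling item may contain tiling items that are themselves aggregates, expanding an aggregate to its basic components is naturally a recursive query. A sketch using a recursive common table expression in SQLite follows; the tiling_composition column names (parent_id, child_id) are assumptions, and UNION (rather than UNION ALL) is used so that an accidental cycle in the data cannot make the query run forever.

```python
import sqlite3

SCHEMA = """
CREATE TABLE tiling_composition (
    parent_id INTEGER NOT NULL,   -- aggregate tiling item
    child_id  INTEGER NOT NULL,   -- tiling item it includes
    PRIMARY KEY (parent_id, child_id)
);
"""

# Walk down the composition tree, then keep only non-aggregate items.
LEAF_COMPONENTS = """
WITH RECURSIVE parts(id) AS (
    SELECT child_id FROM tiling_composition WHERE parent_id = ?
    UNION
    SELECT tc.child_id
    FROM tiling_composition AS tc
    JOIN parts ON tc.parent_id = parts.id
)
SELECT id FROM parts
WHERE id NOT IN (SELECT parent_id FROM tiling_composition)
ORDER BY id
"""


def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Item 1 aggregates items 2 and 3; item 3 in turn aggregates 4 and 5.
    conn.executemany("INSERT INTO tiling_composition VALUES (?, ?)",
                     [(1, 2), (1, 3), (3, 4), (3, 5)])
    return conn


def leaf_components(conn: sqlite3.Connection, tiling_item_id: int) -> list:
    """Non-aggregate tiling items reachable from tiling_item_id."""
    return [row[0] for row in conn.execute(LEAF_COMPONENTS, (tiling_item_id,))]
```

With the demo rows, expanding item 1 yields items 2, 4, and 5; item 3 is omitted because it is itself an aggregate.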
Associated with each probe listed in probe table 408 is a probe role record in a probe role table 410. The probe role record tells, e.g., whether a particular probe in a perfect match/mismatch pair is itself the perfect match or the mismatch. Further associated with each probe is a probe specification record in a probe specification table 412. The probe specification record tells the length of the probe and the orientation of the probe. The orientation of the probe (sense or antisense) is identified within the probe specification record by reference to a record in a sense type table 414, which lists both orientations.
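The lookup tables for probe role, probe specification, and sense type can be mirrored in application code, as in the hypothetical Python sketch below. In particular, the rule used here to derive a mismatch probe (complementing the central base of the perfect-match probe) is a convention assumed for illustration and is not stated in this specification.

```python
from dataclasses import dataclass
from enum import Enum


class ProbeRole(Enum):   # mirrors the role lookup (cf. probe role table 410)
    PERFECT_MATCH = "PM"
    MISMATCH = "MM"


class Sense(Enum):       # mirrors the orientations (cf. sense type table 414)
    SENSE = "sense"
    ANTISENSE = "antisense"


@dataclass(frozen=True)
class ProbeSpec:         # cf. probe specification table 412
    length: int
    sense: Sense


@dataclass(frozen=True)
class Probe:             # cf. probe table 408
    sequence: str
    role: ProbeRole
    spec: ProbeSpec


_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_partner(pm: Probe) -> Probe:
    """Derive a mismatch probe by complementing the central base.

    The central-base rule is an assumption for illustration only.
    """
    if pm.role is not ProbeRole.PERFECT_MATCH:
        raise ValueError("expected a perfect-match probe")
    if len(pm.sequence) != pm.spec.length:
        raise ValueError("sequence length disagrees with probe specification")
    mid = len(pm.sequence) // 2            # index 12 for a 25-mer
    swapped = _COMPLEMENT[pm.sequence[mid]]
    return Probe(pm.sequence[:mid] + swapped + pm.sequence[mid + 1:],
                 ProbeRole.MISMATCH, pm.spec)
```

The mismatch probe shares its perfect-match partner's specification record (length and orientation) and differs only in its role and one base.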

A chip design table 416 lists chip designs. Associated with each chip design is a plurality of tiling items in tiling item table 402. Also associated with each chip design is a chip design type as listed in a chip design type table 418. Examples of chip design types are "expression analysis" or "mutation detection." Each chip design may have many associated chip design names listed in a chip design name table 420. These names may include informal names used within the organization or formal names used in formal inter-organization communications.

Chip designs may be aggregated into chip design sets, which are listed in a chip composition table 422. Each record of chip composition table 422 identifies a chip design set which may include more than one chip design listed in chip design table 416. A chip design set may characterize a group of chips used together for a particular purpose, such as identifying expression of oncogenes or tumor suppressors in humans.

An exception table 424 lists sequences whose investigation was requested but for which optimal probes were not included in the design. Each exception is associated with a particular combination of sequence and tiling item and has an associated exception type listed in an exception type table 426. One type of exception, referred to as an "R" exception, is noted when preferred rules for probe selection have not been followed because they would not result in an adequate set of probes in the chip design for a particular sequence. An "S" exception denotes that the sequence is very similar to another sequence and that the sequences had to be grouped together to find acceptable probe sets, so that certain probes interrogate more than one sequence. An "I" exception indicates that the probe set is incomplete, although the probes that are included in the set interrogating the sequence are of high quality. A "B" exception indicates that all probe selection rules have been dropped and that the probes are of low quality. A "G" exception indicates that the sequence overlaps with another sequence.

There is a sequence item table 426 that lists all the sequence items of chip design database 102. Associated with each listed sequence item is a sequence type from sequence type table 428. Examples of sequence types include "sequence" and "group of sequences." A sequence composition table 430 is used to aggregate sequences into groups of sequences. Each group listed in sequence composition table 430 has associated sequences in sequence item table 426.

There is a sequence derivation table 432 which lists derivations used to transform one sequence listed in sequence item table 426 into another. Each derivation has a derivation type listed in a derivation type table 434. Examples of derivation types include "removal of ambiguities" or "change in GenBank information." An allele table 436 lists polymorphisms for some of the sequences listed in sequence item table 426.
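Because each derivation record links a source sequence item to a derived one, the history of a sequence can be reconstructed by walking those links backwards. The Python sketch below assumes, for illustration only, that each derived sequence has at most one source; the Derivation record and the lineage function are hypothetical names.

```python
from typing import List, NamedTuple, Optional, Tuple


class Derivation(NamedTuple):
    """One row of a hypothetical sequence derivation table."""
    source_id: int        # sequence item the derivation starts from
    derived_id: int       # sequence item it produces
    derivation_type: str  # e.g. "removal of ambiguities"


def lineage(derivations: List[Derivation],
            sequence_id: int) -> List[Tuple[int, Optional[str]]]:
    """Trace a sequence item back to its original source.

    Returns (sequence_id, derivation_type) pairs, oldest first; the
    original source carries None because no derivation produced it.
    """
    produced_by = {d.derived_id: d for d in derivations}
    chain = []
    seen = set()
    current = sequence_id
    while current in produced_by:
        if current in seen:
            raise ValueError("cycle in derivation records")
        seen.add(current)
        d = produced_by[current]
        chain.append((current, d.derivation_type))
        current = d.source_id
    chain.append((current, None))
    chain.reverse()
    return chain
```

For example, if sequence 2 was derived from sequence 1 by removal of ambiguities and sequence 3 from sequence 2 by a change in GenBank information, lineage(records, 3) lists sequences 1, 2, and 3 in that order together with the derivation that produced each.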

A sequence overlap table 438 lists overlaps between sequences of sequence item table 426. These overlaps are important to know for the probe selection process. The overlaps are determined by a process known as BLAST comparison. The result of a BLAST comparison is a description of the match quality between the compared sequences. This match quality is stored in sequence overlap table 438.

During the chip design process, sequences may be the basis for creating tiling items. Sequence information is also the basis for pruning the set of probes that are included in a chip design. Pruning is a step of probe selection. Objectives of pruning may include: assuring that no probe is a duplicate of another probe in a probe pair set, assuring that no probe is the same as any other probe in a chip or set of chips, or assuring that a probe is not a duplicate of any probe that would be used to interrogate a set of sequences larger than the set investigated by a chip or set of chips. For example, once the entire human genome is known, it may be useful to prune probe sets so that no probe is used that would interrogate more than one sequence in the genome. The larger the set of sequences pruned against, the higher the quality of the resulting chip design, since ambiguity in analysis results is greatly reduced. To facilitate pruning, chip design database 102 provides a pruning set table 440 which lists pruning sets. Each pruning set has an associated chip design in chip design table 416. A pruning map table 442 lists correlations between particular sequence items and pruning sets and implements the many-to-many relation that exists between sequence item table 426 and pruning set table 440.
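A toy version of the pruning objectives listed above is sketched below. It is deliberately simplified: exact string containment on one strand stands in for real cross-hybridization analysis, and the two dictionaries standing in for the candidate probe lists and the pruning set are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def prune_probes(candidates: Dict[str, List[str]],
                 pruning_set: Dict[str, str]) -> Dict[str, List[str]]:
    """Keep only candidate probes that are unique to their target sequence.

    candidates:  target sequence id -> candidate probe strings for it.
    pruning_set: sequence id -> full sequence to prune against (targets
                 plus any other sequences, e.g. a larger genome set).
    """
    # How many different targets propose each probe string.
    owners = Counter(p for probes in candidates.values() for p in set(probes))
    kept = {}
    for seq_id, probes in candidates.items():
        survivors = []
        for p in dict.fromkeys(probes):      # drop repeats, keep order
            if owners[p] > 1:
                continue                     # duplicated on another target
            if any(p in other for other_id, other in pruning_set.items()
                   if other_id != seq_id):
                continue                     # would also hit another sequence
            survivors.append(p)
        kept[seq_id] = survivors
    return kept
```

Adding sequences to the pruning set can only remove probes, never add them, which reflects the trade-off described above: a larger pruning set yields fewer but less ambiguous probes.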

A genomic item table 444 lists genomic items. Each listed genomic item may be a gene or EST or an aggregate of genes or ESTs. A genomic composition table 446 lists the relationships between aggregations of genes and/or ESTs and their components. A genomic name table 448 lists names of genomes. Each name may apply to more than one genome. Similarly, each genome may have more than one name. A genomic name map table 450 implements the many-to-many relationships between genomes and names.

A genomic type table 452 lists the various types of genomic items, such as "gene," "gene cluster," "EST," and "EST cluster." Each genomic item in genomic item table 444 has an associated genomic type in genomic type table 452. A species table 454 lists the species associated with the genomic items. Each genomic item in genomic item table 444 has an associated species in species table 454.

It is often useful to know the position of a genomic item in a chromosome. A
chromosome table 456 lists various chromosomes. Each record in a chromosome map table
458 indicates which chromosome a genomic item is located in and where on the
chromosome the genomic item would be found.

It is also useful to store information about database references for genomic
items. The records of biological database reference table 460 each include information as
would be found in one database about one genomic item. The databases themselves are
listed in a biological database table 462. Representative databases include GenBank, Entrez,
and TIGR.

Genomic items are themselves related to one another by functional homology.
Genomic items may be grouped by the functions performed by proteins that result from their
expression. A homology function table 464 lists different functions in a cell. A homology
map table 466 lists associations between the listed homologies and genomic items listed in
genomic item table 444.

Genomic items listed in genomic item table 444 may also have associated
annotation information. An annotation table 468 lists annotations for genomic items. Each
record in an annotation map table 470 associates an annotation and a genomic item. A
comment found in an annotation may be backed up by a citation to the literature listed in a
citation table 472.

Genomic items may be grouped into sets corresponding to projects where
each project has a particular investigative objective. For example, one project may
investigate genes relating to high blood pressure while another project investigates genes
relating to breast cancer. Typically, a project will be the impetus for designing a chip or a set
of chips. A project table 476 lists such projects. A project map table 478 lists associations
between projects and genomic items and, like the other map tables, implements a many-to-many
relationship between genomic items and projects.

The chip design process may originate with a project assignment which
specifies genomic items, or may alternatively originate with a design request that specifies
sequences to be interrogated by probes on the chip. A design request table 480 lists such
design requests. Each design request may have many associated design request items listed
in a design request item table 482. The records of design request item table 482 each
identify a requested sequence item.

Not every requested sequence will necessarily fit in the final chip design. If a
requested sequence is not found in a chip design, this is recorded in a reject map table 484.
Each record in reject map table 484 identifies a sequence that was requested to be included in
a particular chip design but left out. Each such reject record has an associated reject type
selected from the types listed in a reject type table 486.
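Recording rejects could look like the following Python sketch. It assumes a deliberately simple placement rule (requests are placed in priority order while enough features remain) and a hypothetical reject type label, `insufficient_space`; neither the rule nor the label is specified in this document. Each `RejectRecord` carries the three pieces of information a reject map record holds: the sequence, the chip design, and the reject type.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RejectRecord:
    # One reject map entry: which sequence was left out of which chip
    # design, and why (a reject type label).
    sequence_id: int
    chip_design_id: int
    reject_type: str


def place_requested_sequences(
    chip_design_id: int,
    requested: List[Tuple[int, int]],
    feature_count: int,
) -> Tuple[List[int], List[RejectRecord]]:
    """Place requested sequences until the chip runs out of features.

    requested holds (sequence_id, features_needed) pairs in priority order.
    Any sequence that does not fit in the remaining features is recorded
    as a reject instead of being placed.
    """
    placed: List[int] = []
    rejects: List[RejectRecord] = []
    remaining = feature_count
    for sequence_id, needed in requested:
        if needed <= remaining:
            placed.append(sequence_id)
            remaining -= needed
        else:
            rejects.append(
                RejectRecord(sequence_id, chip_design_id, "insufficient_space"))
    return placed, rejects
```

Because the loop keeps scanning after a rejection, a later, smaller request can still be placed: with 100 features and requests of 90, 20, and 5 features, only the 20-feature request is rejected.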

Associated with each design request or project is a customer as listed in a
customer table 488. Each customer may have one or more associated design requests,
annotations, or projects as listed in tables 480, 468, and 476 respectively. A customer may
also be the source of one or more sequence items as found in a sequence item table 426. A
source map table 490 implements the many-to-many relationship between sequence items
and customers. Each customer is associated with a site as recorded in a site table 492.

There may also be associations between design requests and projects.
Projects may have one or more associated design requests and design requests may have one
or more associated projects. A design map table 493 lists associations between design
requests and projects.

Companies may have one or more sites and are listed in a company table 494.
Biological databases listed in biological database table 462 may be proprietary to companies
listed in company table 494. By providing a relationship between these two tables, chip
design database 102 allows the chip designer to keep track of genomic item information that
should be kept proprietary to particular orderers. Source map table 490 similarly assists in
maintaining the necessary confidentiality for customer-originated sequence information. A
company may request specific probes to be included in a chip. These requests are listed in a
probe request table 491. An order limits table 493 lists the contractual limitations that apply
to chip design work to be done for particular companies. For example, a company may be
limited to investigating a certain number of genes per chip, or limited to requesting a certain
number of probes per chip.
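A check of a design request against a company's order limits could be as simple as the following Python sketch. `OrderLimits` and `check_design_request` are hypothetical names; the two fields mirror the genes-per-chip and probes-per-chip limits described above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OrderLimits:
    # Per-company contractual caps (hypothetical field names).
    genes_per_chip: int
    probe_requests_per_chip: int


def check_design_request(limits: OrderLimits, genes_requested: int,
                         probes_requested: int) -> List[str]:
    """Return human-readable violations; an empty list means the request fits."""
    violations: List[str] = []
    if genes_requested > limits.genes_per_chip:
        violations.append(
            f"{genes_requested} genes requested; limit is {limits.genes_per_chip}")
    if probes_requested > limits.probe_requests_per_chip:
        violations.append(
            f"{probes_requested} probes requested; "
            f"limit is {limits.probe_requests_per_chip}")
    return violations
```

Returning a list of messages rather than a single boolean lets the chip designer report every exceeded limit to the customer at once.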

A communications table 496 lists communications between the chip designer
and customer about a particular design request. Each design request may have one or more
associated communications. Each communication listed in communications table 496 has an
associated communications type as listed in a communications type table 498. Different
communication types may correspond to different stages in the process. For example, the
different types may include "chip request," "sequences updated," "sequences incomplete,"
etc.

A classification table 500 lists classifications of item requests. Such
classifications represent functional hierarchies. Classifications may include, e.g., tissue
types or protein family names. A classification map table 502 associates item requests with
classifications.

The many-to-many relationship between genomic items and sequence items is
implemented by a sequence map table 504 which lists associations between genomic items
and sequence items. The many-to-many relationship between sequence items and tiling
items, and thus probes, is implemented by a sequence used map 506 which lists associations
between sequence items and tiling items. A control map table 508 similarly implements a
many-to-many relationship between sequence items and tiling types.
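Because each probe belongs to a tiling item, the sequence used map is enough to find every probe that serves a given sequence item. The sketch below uses Python's built-in `sqlite3` module with hypothetical, simplified table and column names and placeholder data; it assumes each probe record refers to exactly one tiling item.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only.
SCHEMA = """
CREATE TABLE sequence_item (sequence_id INTEGER PRIMARY KEY);
CREATE TABLE tiling_item (tiling_id INTEGER PRIMARY KEY);
-- Each probe belongs to exactly one tiling item.
CREATE TABLE probe (
    probe_id INTEGER PRIMARY KEY,
    tiling_id INTEGER NOT NULL REFERENCES tiling_item(tiling_id),
    probe_sequence TEXT NOT NULL
);
-- Sequence used map: many-to-many between sequence items and tiling items.
CREATE TABLE sequence_used_map (
    sequence_id INTEGER NOT NULL REFERENCES sequence_item(sequence_id),
    tiling_id INTEGER NOT NULL REFERENCES tiling_item(tiling_id),
    PRIMARY KEY (sequence_id, tiling_id)
);
"""


def probes_for_sequence(conn: sqlite3.Connection, sequence_id: int) -> list:
    """Probe IDs reachable from a sequence item through its tiling items."""
    rows = conn.execute(
        "SELECT p.probe_id FROM probe AS p "
        "JOIN sequence_used_map AS u ON u.tiling_id = p.tiling_id "
        "WHERE u.sequence_id = ? ORDER BY p.probe_id", (sequence_id,))
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO sequence_item VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO tiling_item VALUES (?)", [(100,), (200,)])
conn.executemany("INSERT INTO probe VALUES (?, ?, ?)",
                 [(1, 100, "ACGTACGT"), (2, 100, "ACGAACGT"),
                  (3, 200, "TTGGCCAA")])
# Tiling item 100 is shared by sequences 1 and 2; 200 is used only by 2.
conn.executemany("INSERT INTO sequence_used_map VALUES (?, ?)",
                 [(1, 100), (2, 100), (2, 200)])
```

Here `probes_for_sequence(conn, 1)` yields probes 1 and 2, while sequence 2 reaches all three probes, because tiling item 100 is shared by both sequences.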

Database Contents 

The contents of the tables introduced above will now be presented in greater
detail in the following chart. Each table name is followed by its fields, and each field is
followed by a description of its contents.

CDtblChromosome
    CDfldChromosomeID: Identification number for chromosome.
    CDfldChromosomeName: Name of chromosome.


CDtblChromosomeMap
    GENOMIC_ItemID(FK): Reference to genomic item in genomic item table.
    CDfldChromosomeID(FK): Reference to chromosome table.
    CDfldChroMapCytogenicLocation: Cytogenetic location.
    CDfldChroMapGeneticLocation: Genetic location.
    CDfldChroMapPhysicalLocation: Physical location of genomic item on chromosome.

GENOMIC_NAME
    GENOMIC_ID(IE1.1): Reference to genomic item table.
    GENOMIC_Name: Name of genome.
    CDfldGenomicNameLong: Longer version of genomic name.

SPECIES
    SPECIES_ID: Species identification.
    SPECIES_Type: Type of species.
    SPECIES_CommonName: Common name of species.


CDtblGeneNameMap
    GENOMIC_ID(FK): Reference to genomic name table.
    GENOMIC_ItemID(FK): Reference to genomic item table.

CDtblHomologyMap
    GENCOMP_Element(FK): Points to genomic item in genomic item table.
    GENCOMP_AggregateID: Identifies aggregation of genomic items.

GENOMIC_TYPE
    GENOMICTYPE_ID: Identifier for genomic type.
    GENOMICTYPE_Name: Name of genomic type.
    CDfldgenomictypedescription: Description of genomic type.

GENOMIC_ITEM
    GENOMIC_ItemID: Genomic item identifier.
    SPECIES_ID(FK): Reference to species table.
    GENOMIC_ItemId(FK)(IE1.1): Reference to genomic type table.

CDtblHomologyMap
    CDfldHomologyID(FK): Homology identifier.
    GENOMIC_ItemID(FK): Reference to genomic item table.
CDtblHomologyFunction
    CDfldHomologyID: Homology identifier.
    CDfldHomologyName: Name of homology.
    CDfldHomologyDescription: Description of homology.

BIOLOGICAL_DB_REFERENCE
    BIODBREF_ID: Identifier for biological database reference.
    GENOMIC_ItemID(FK): Reference to genomic item table.
    BIODB_ID(FK)(AK1.2): Reference to biological database table.
    BIODBREF_Value(AK1.1): Reference value, e.g., accession number.
    BIODBREF_Description: Description of database reference.

BIOLOGICAL_DB
    BIODBREF_ID: Biological database identifier.
    COMPANY_ID(FK): Reference to company table.
    BIODB_Name: Name of database.
    BIODB_ReferenceType: Type of reference.
    CDfldBioDBWebSite: Website for database.

ANNOTATION
    ANNOTATION_ID: Annotation identifier.
    ANNOTATION_Description: Description of annotation.

ANNOTATION_MAP
    ANNOTATION_ID(FK): Reference to annotation table.
    GENOMIC_ItemID(FK): Reference to genomic item table.
    CUSTOMER_ID(FK): Reference to customer table.
    CITATION_ID(FK): Reference to citation table.
    ANNOTATIONMAP_Rating: Indication of quality of annotation.

CITATION
    CITATION_ID: Citation identifier.
    CITATION_Source: Source of citation.

SEQUENCE_ITEM
    SEQUENCE_ITEM: Sequence identifier.



wo 99/05574 



PCT/US98/15456 



18 





    SEQTYPE_ID(FK): Reference to sequence type table.
    SEQUENCE_Sequence: Sequence (may be a very long field).

SEQUENCE_MAP
    SEQUENCE_ID(FK): Reference to sequence item table.
    GENOMIC_ItemID(FK)(IE1.1): Reference to genomic item table.

CDtblAllele
    CDfldAlleleID: Allele identifier.
    SEQUENCE_ID(FK): Reference to sequence item table.
    CDfldAlleleOffset: Position of polymorphism.
    CDfldAlleleBase: Base defined by polymorphism.

E/198
    SEQUENCE_ID(FK)(IE2.1): Reference to sequence item table.
    CHIP_DesignID(FK)(IE1.1): Reference to chip design table.
    REJECTTYPE_ID(FK): Reference to reject type table.

E/200
    REJECTTYPE_ID: Reject type identifier.
    REJECTTYPE_Name: Name of reject type.
    REJECTTYPE_Description: Description of reject type.

SEQUENCE_TYPE
    SEQTYPE_ID: Sequence type identifier.
    SEQTYPE_Name: Name of sequence type.
    CDfldseqtypedescription: Description of sequence type.

SEQUENCE_DERIVATION
    SEQUENCE_ID(FK): Original sequence.
    SEQCOMP_ElementID(FK): Derived sequence.
    CDfldDeriveTypeID(FK): Reference to derivation type table.
    CDfldSeqDeriveAlias: Suffix attached to name of derived sequence.
    CDfldSeqDeriveOffset: Offset between original sequence and derived sequence.

CDtblDerivationType
    CDfldDeriveTypeID: Derivation type identifier.
    CDfldDeriveName: Name of derivation type.
    CDfldDeriveDescription: Description of derivation type.
    String: Suffix associated with derivation type.




SEQUENCE_OVERLAP
    SEQUENCE_ID(FK): First sequence compared.
    SEQSEQOVERLAP_ID2: Second sequence compared.
    SEQOVERLAP_MatchPercent: Percentage match between compared sequences.
    SEQOVERLAP_MatchSequence: Sequence common to the two compared sequences.
    CDfldSeqOverlapOffset: Offset value if the second compared sequence is offset from the
    first compared sequence.

SEQUENCE_COMPOSITION
    SEQCOMP_ElementID(FK): Identifier of sequence included in aggregate.
    SEQCOMP_AggregateID: Identifier of aggregate of sequences.

PRUNING_MAP
    PRUNINGSET_ID(FK): Pruning set identifier.
    SEQUENCE_ID(FK): Reference to sequence item table.

PRUNING_SET
    PRUNINGSET_ID: Pruning set identifier.
    PRUNINGSET_NAME: Name of pruning set.
    PRUNINGSET_Description: Description of pruning set.


CHIP_DESIGN
    CHIP_DesignID: Chip design identifier.
    COMPANY_ID(FK): Reference to company table.
    CHIP_TypeID(FK): Reference to chip type table.
    CHIP_FeatureSize: X dimension size of chip features, e.g., 25 or 50 µm.
    CHIP_MaskID: Mask identifier associated with mask for chip.
    CHIP_FeatureCountY: Feature size in the Y direction.
    CHIP_PartNumber: Part number to identify chip.
    CHIP_Code: Another chip designator.
    CHIP_GridX: Number of cells in the X direction.
    CHIP_SizeUnit: Units used for feature size, typically microns.
    CHIP_GridY: Number of cells in the Y direction.
    CHIP_Description: Description of chip.
    PRUNINGSET_ID(FK): Reference to pruning set table.




CHIP_DESIGN_TYPE
    CHIPTYPE_ID: Chip type identifier.
    CHIPTYPE_Name: Name of chip type.
    CDfldchiptypedescription: Description of chip type.

CDtblChipDesignName
    CHIP_DesignID(FK): Reference to chip design table.
    CDfldChipDesignName: Name of chip design.

CHIP_COMPOSITION
    CHIP_DesignID(FK): Identifier of chip set.
    CHIPCOMP_ElementID: Identifier of chip in chip set.

TILING_ITEM
    TILING_ID: Tiling item identifier.
    CHIP_DesignID(FK): Reference to chip design table.
    TILING_TypeID(FK): Reference to tiling type table.

TILING_TYPE
    TILINGTYPE_ID: Tiling type identifier.
    TILINGTYPE_Name: Name of tiling type.
    TILINGTYPE_DesType: Code for tiling type.
    TILINGTYPE_Set: Description of tiling type.

CONTROL_MAP
    TILINGTYPE_ID(FK): Reference to tiling type table.
    SEQUENCE_ID(FK): Reference to sequence item table.

TILING_COMPOSITION
    TILECOMP_AggregateId(FK): Identifier for aggregation of tiling items.
    TILECOMP_ElementID(FK): Identifier for tiling item within aggregation.




PROBE
    PROBE_ID: Probe identifier.
    PROBEROLE_ID(FK): Reference to probe role table.
    TILING_ID: Reference to tiling item table.
    PROBE_Sequence: Probe sequence.
    PROBESPEC_ID(FK): Probe specification identifier.
    PROBE_X: X position of probe on chip.
    PROBE_Y: Y position of probe on chip.
    Number: Sequence position of probe.

PROBE_ROLE
    PROBEROLE_ID: Probe role identifier.
    PROBEROLE_Name: Name of probe role, e.g., perfect match or mismatch.
    PROBEROLE_DesType: Code representing probe role name.
    PROBEROLE_Control: Indicates whether probe is a control probe.

PROBE_SPEC
    PROBESPEC_ID: Probe specification identifier.
    SENSETYPE_ID(FK)(AK1.3): Sense type indication, e.g., sense or antisense; reference to
    sense type table.
    PROBESPEC_Length(AK1.1): Length of probe.
    I'tvUourl^^OUDAItWluOn(AK1.2): Position at which mismatch is made for a mismatch probe.

SENSE_TYPE
    SENSETYPE_ID: Sense type identifier.
    SENSETYPE_Name: Name of sense type, e.g., sense or antisense.
    SENSETYPE_Description: Longer version of sense or antisense.
    SENSETYPE_Sign: Positive or negative, depending on whether sense or antisense.
SEQUENCE_USED
    SEQUENCE_ID(FK): Reference to sequence item table.
    TILING_ID(FK): Reference to tiling item table.

CRITERIAN
    EXCEPTION_ID: Exception identifier.
    SEQUENCE_ID(FK): Reference to sequence item table.
    EXCEPTIONTYPE_ID(FK): Reference to exception type table.
    TILING_ID(FK): Reference to tiling item table.

CRITERIAN2
    EXCEPTIONTYPE_ID: Exception type identifier.
    CRITERIUMTYPE_Extension: Suffix to identify criterium type.
    EXCEPTIONTYPE_Name: Name of criterium type.
    CRITERIUMTYPE_Description: Description of criterium type.
    CRITERIUM_Cluster: Whether criterium type is part of a cluster.

CUSTOMER
    CUSTOMER_ID: Customer identifier.
    CUSTOMER_SiteID(FK): Reference to site table.
    CUSTOMER_ContactName: Name of customer contact.
    CUSTOMER_PhoneNumber: Phone number of customer contact.
    COfldPersonEmail: E-mail address of customer contact.
    COfldPersonLastName: Last name of customer contact.

SITE
    SITE_ID: Site identifier.
    SITE_Address: Address of site.
    SITE_PhoneNumber: Phone number of site.
    COMPANY_ID(FK): Reference to company table.

COMPANY
    COMPANY_ID: Company identifier.
    COMPANY_Name: Name of company.

PROBE_REQUEST
    PROBEREQ_ID: Probe request identifier.
    COMPANY_ID(FK): Reference to company table.
    PROBEREQ_ChipID: Chip that probe request is made for; reference to chip design table.
    PROBEREQ_ProbeID: Identifier of probe that was requested; reference to probe table.


ORDER_LIMITS
    COMPANY: Reference to company table.
    LIMIT_GenesPerChip: Maximum number of genes per chip.
    LIMIT_ProbeRequestPerChip: Maximum number of probes per chip.

CDtblSourceMap
    SEQUENCE_ID(FK): Reference to sequence item table.
    CUSTOMER_ID(FK): Reference to customer table.
    CDfldSourceMapDateAcquired: Date source map acquired.
    CDfldSourceMapAnnealingTemp: Annealing temperature for sequence.
    CDfldSourceMapConfidence: Confidence level in sequence map.
    CDfldSourceMapStartMaterial: Pertains to method of creation of map.
    String: Comment.

PROJECT
    PROJECT_ID: Project identifier.
    CUSTOMER: Reference to customer table.
    PROJECT_DateCreated: Date of project creation.
    PROJECT_Description: Description of project.

PROJECT_MAP
    PROJECT_ID(FK): Reference to project table.
    GENOMIC_ItemID(FK): Reference to genomic item table.

COtblDesignRequest
    COfldDesignRequestID: Design request identifier.
    CUSTOMER: Customer identifier.
    CHIP_DesignI: Reference to chip design table.
    COfldDesignRequestDateReceived: Date request received.
    COfldDesignRequestPO: Purchase order number.
    COfldDesignRequestGenesPerChip: Number of genes per chip requested.
    COfldDesignRequestProbesPerGene: Number of probes per gene requested.
    COfldDesignRequestFeatureSize: Feature size requested, e.g., 25 or 50 µm.
    COfldDesignRequestFeatureCount: How many features will fit on chip.
    COfldDesignRequestDescription: Description of requested chip.
    COfldDesignRequestInstructions: Customer instructions.
    String: Orientation of target sequences that are to be read with the chip.


DESIGN_MAP
    PROJECT_ID(FK): Reference to project table.
    COfldDesignRequestID(FK): Reference to design request table.

COMMUNICATIONS
    COMM_ID: Communications identifier.
    COfldDesignRequestID(FK): Reference to design request table.
    COMMTYPE_ID(FK)(IE1.1): Reference to communication type table.
    COMM_Date: Date of communication.
    COMM_Description: Description of communication.

COMM_TYPE
    COMMTYPE_ID: Communication type identifier.
    COMMTYPE_Name: Name of communication type.
    COfldcommtypedescription: Description of communication type.

ITEM_REQUESTED
    ITEM_RequestedID: Requested item identifier.
    COfldDesignRequestID(FK): Reference to design request table.
    SEQUENCE_ID(FK): Reference to sequence item table.
    ITEM_Start: Permissible starting point in submitted sequence.
    ITEM_Stop: Permitted stopping point in sequence.
    ITEM_Alias: Another name for specified sequence.
    ITEM_Description: Description of sequence.
    ITEM_Reverse: Whether sequence is to be reversed before placement on chip.
    Import_Qualifier: Import qualifier.
    COflditemrequestedprobeperitem: Override to number of probes per gene in design request
    table.
    COflditemrequestedtilereverse: Whether particular sequence is to be tiled in sense or
    antisense direction.


CLASSIFICATION
    CLASSIFICATION_ID: Classification identifier.
    CLASS_Keyword(AK1.1): Description of classification.

CLASS_MAP
    ITEM_RequestedID(FK): Reference to item request table.
    CLASSIFICATION_ID(FK): Reference to classification table.
    CLASSMAP_Group: Grouping together of classifications specified by customer.
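The probe specification and probe role tables (probe length, mismatch position, sense type, and the perfect match or mismatch role) suggest how probe sequences could be generated from a target sequence. The Python sketch that follows is hypothetical: it assumes a sense probe is written exactly as the target window, an antisense probe is that window's reverse complement, and a mismatch probe complements the single base at a 1-based mismatch position. That last rule is one common convention for perfect match / mismatch probe pairs, but this document does not specify it.

```python
# Watson-Crick complements for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def perfect_match_probe(target: str, start: int, length: int,
                        antisense: bool) -> str:
    """Probe of `length` bases covering target[start:start + length].

    Assumption: a sense probe is the window as written; an antisense
    probe is its reverse complement.
    """
    window = target[start:start + length]
    if len(window) != length:
        raise ValueError("probe window runs past the end of the target")
    return reverse_complement(window) if antisense else window


def mismatch_probe(pm_probe: str, mismatch_position: int) -> str:
    """Complement the base at the 1-based mismatch position (assumed rule)."""
    if not 1 <= mismatch_position <= len(pm_probe):
        raise ValueError("mismatch position lies outside the probe")
    i = mismatch_position - 1
    return pm_probe[:i] + COMPLEMENT[pm_probe[i]] + pm_probe[i + 1:]
```

For example, the 5-base window starting at offset 2 of `AACCGGTTAC` is `CCGGT` in the sense orientation and `ACCGG` in the antisense orientation; complementing its central (third) base gives the mismatch probe `CCCGT`.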






It is understood that the examples and embodiments described herein are for
illustrative purposes only and that various modifications or changes in light thereof will be
suggested to persons skilled in the art and are to be included within the spirit and purview of this
application and scope of the appended claims. For example, tables may be deleted, contents of
multiple tables may be consolidated, or contents of one or more tables may be distributed among
more tables than described herein to improve query speeds and/or to aid system maintenance.
Also, the database architecture and data models described herein are not limited to biological
applications but may be used in any application. All publications, patents, and patent
applications cited herein are hereby incorporated by reference.
WHAT IS CLAIMED IS:

1. A computer-readable storage medium having stored thereon:
a relational database comprising:
a probe table including a plurality of probe records, each of said probe records
specifying a polymer probe for use in one or more polymer probe arrays;
a sequence item table including a plurality of sequence item records, each of said
sequence item records specifying a nucleotide sequence to be investigated in said one or more
polymer probe arrays; and
wherein there is a many-to-many relationship between said probe records and
said sequence item records.

2. The medium of claim 1 wherein said relational database further comprises:
a tiling item table including a plurality of tiling item records, each of said tiling
item records having an aggregation relationship with said probe records so that each tiling item
record has many associated probe records.

3. The medium of claim 1 wherein said relational database further comprises:
a genomic item table including a plurality of genomic item records, each of said
genomic item records specifying a genomic item to be investigated by said one or more polymer
probe arrays; and
wherein there is a many to many relationship between genomic item records and
sequence item records.

4. The medium of claim 1 wherein said relational database further comprises:
a chip design table including a plurality of chip design records, each of said chip
design records specifying a design of a chip including a subset of said plurality of probe records.
5. A computer implemented method for operating a relational database comprising:
creating a probe table including a plurality of probe records, each of said probe
records specifying a polymer probe for use in one or more polymer probe arrays;
creating a sequence item table including a plurality of sequence item records,
each of said sequence item records specifying a nucleotide sequence to be investigated in said
one or more polymer probe arrays;
storing data in said probe table and said sequence item table; and
wherein there is a many-to-many relationship between said probe records and
said sequence item records.

6. The method of claim 5 further comprising the step of:
creating a tiling item table including a plurality of tiling item records, each of said
tiling item records having an aggregation relationship with said probe records so that each tiling
item record has many associated probe records.

7. The method of claim 5 further comprising the step of:
creating a genomic item table including a plurality of genomic item records, each
of said genomic item records specifying a genomic item to be investigated by said one or more
polymer probe arrays; and
wherein there is a many to many relationship between genomic item records and
sequence item records.

8. The method of claim 5 further comprising the step of:
creating a chip design table including a plurality of chip design records, each of
said chip design records specifying a design of a chip including a subset of said plurality of
probe records.
9. A computer system comprising:
a processor and
a storage medium storing a relational database accessible by said processor, said
storage medium having stored thereon:
a relational database comprising:
a probe table including a plurality of probe records, each of said probe records
specifying a polymer probe for use in one or more polymer probe arrays;
a sequence item table including a plurality of sequence item records, each of said
sequence item records specifying a nucleotide sequence to be investigated in said one or more
polymer probe arrays; and
wherein there is a many-to-many relationship between said probe records and
said sequence item records.